---
title: "Assignment6-MidtermDashboard"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: minty
      navbar-bg: "purple"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
```
Air Quality Data
===
Column {data-width=250}
---
```{r}
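# Load the county-level PM2.5 data (expects avgpm25.csv in the working directory,
# which is the .Rmd's folder when knitting)
PM <- read.csv("avgpm25.csv")
glimpse(PM)      # column names and types
datatable(PM)    # interactive, searchable table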
```
Column {data-width=250}
---
Boxplot of Air Quality Data
===
```{r}
boxplot(PM$pm25,main="Distribution of Air Quality Values")
```
### Analysis
The boxplot looks fairly symmetric, with the whiskers ending at about 5 and 15. There are several outliers above 15 and a few below 5, and Q1, the median (Q2), and Q3 appear roughly evenly spaced.
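To read the whisker ends and outliers off exactly instead of estimating them from the plot, `boxplot.stats()` can be called on the same vector. `boxplot()` uses it internally: the whiskers reach the most extreme points within 1.5 × the box length of the box, and anything beyond is drawn as an outlier. A minimal sketch, using a small hypothetical vector `x` of made-up values in place of `PM$pm25`:

```r
# Hypothetical values standing in for PM$pm25 (made up, not the real data)
x <- c(4.2, 8.1, 9.0, 9.6, 10.2, 10.8, 11.3, 12.0, 12.9, 17.5)

# boxplot.stats() applies the 1.5 * box-length rule that boxplot() draws
bs <- boxplot.stats(x)
bs$stats  # lower whisker, lower hinge (Q1), median, upper hinge (Q3), upper whisker
bs$out    # points beyond the whiskers, drawn as outliers
```

Calling `boxplot.stats(PM$pm25)` the same way would list the actual whisker ends and every outlier in this data.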
Boxplot of Air Quality Values by Region
===
```{r}
boxplot(pm25 ~ region, data = PM,
        main = "Distribution of Air Quality Values by Region",
        xlab = "Region", ylab = "Air Quality", col = "pink")
```
### Analysis
Overall, the west has a wider range of PM2.5 values. The east has the higher median, but a few outliers in the west reach higher values than any county in the east.
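The same comparison can be put in numbers by summarizing each region separately with `tapply()`, which applies a function within each level of a grouping variable. A minimal sketch with a hypothetical data frame `df` that has the same `pm25` and `region` columns as `PM` but made-up values:

```r
# Hypothetical stand-in for PM: same column names, made-up values
df <- data.frame(
  pm25   = c(9.5, 10.5, 11.0, 11.5, 12.0, 6.0, 7.5, 9.0, 15.5, 18.0),
  region = rep(c("east", "west"), each = 5)
)

# Median PM2.5 level within each region
med <- tapply(df$pm25, df$region, median)

# Range (max - min) within each region; a larger value means more spread
spread <- tapply(df$pm25, df$region, function(v) diff(range(v)))

med
spread
```

Replacing `df` with `PM` would give the actual regional medians and ranges behind the boxplot.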
Air Quality Values Over 15
===
Column {data-width=450}
---
```{r}
# Keep only counties whose annual average PM2.5 exceeds 15
Fifteen <- filter(PM, pm25 > 15)
datatable(Fifteen)
```
### Analysis
All eight counties with an annual average PM2.5 above 15 are in the west region. Their FIPS codes all begin with "6", the state code for California (06, with the leading zero dropped because `fips` was read in as an integer).
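The leading "6" comes from how the codes were read in: a county FIPS code has five digits, the first two being the state code, and `read.csv()` stores `fips` as an integer, which drops the leading zero from California's code 06. A minimal sketch, using FIPS values that appear in this data, that restores the padding with `sprintf()` and pulls out the state code:

```r
# FIPS values seen in this data set (read.csv() stored them as integers)
fips <- c(1069L, 6019L, 6037L, 6107L)

# Pad to five digits, then take the first two characters as the state code
fips5 <- sprintf("%05d", fips)
state <- substr(fips5, 1, 2)

data.frame(fips, fips5, state)
```

State code 06 is California; an east-region code such as 1069 pads to 01069, whose state code 01 is Alabama.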
Violin Plot of Data
===
Column {data-width=250}
---
```{r}
# Requires the vioplot package; install it once with install.packages("vioplot")
vioplot::vioplot(PM$pm25 ~ PM$region,
                 main = "Violin Plot of the Distribution of Air Quality by Region",
                 xlab = "Region", ylab = "Air Quality", col = "pink")
```
### Analysis
In the violin plot, the white dot inside the black box (the IQR) marks the median, showing that the east has a higher median PM2.5 level than the west. The violin plot also shows the west's wider range, but it does not single out outliers the way the boxplot does.
Histogram of Air Quality Data
===
```{r}
ggplot(PM, aes(x = pm25)) +
  geom_histogram(fill = "blue", color = "black") +
  # Dashed red line at the mean; a constant xintercept draws one line, not one per row
  geom_vline(xintercept = mean(PM$pm25, na.rm = TRUE), color = "red", linetype = "dashed") +
  labs(title = "Histogram of Air Quality Levels", x = "Air Quality", y = "Count") +
  theme_minimal()
```
### Analysis
The histogram looks roughly symmetric, although the tail of values above about 13 could make it slightly right-skewed.
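One quick check on the direction of skew is to compare the mean (the dashed red line) with the median, since a long upper tail pulls the mean above the median. A minimal sketch with a hypothetical right-skewed vector `x` (made-up values, not the real data); base R has no skewness function, so `skewness()` here is a hand-written helper computing the moment-based coefficient:

```r
# Hypothetical right-skewed values (made up, not the real data)
x <- c(8, 9, 9, 10, 10, 10, 11, 11, 13, 16)

mean(x)    # pulled upward by the long upper tail
median(x)

# Hand-written moment-based skewness: positive values indicate a right skew
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}
skewness(x)
```

Running the same comparison on `PM$pm25` would show whether the upper tail is strong enough to count as a right skew.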
Histogram of Air Quality Data by Region
===
```{r}
ggplot(PM, aes(x = pm25)) +
  geom_histogram(fill = "blue", color = "black") +
  facet_wrap(~ region) +
  labs(title = "Histogram of Air Quality by Region", x = "Air Quality", y = "Count")
```
### Analysis
The east histogram is taller because the east has more counties in the data, and its shape is fairly symmetric. The west histogram is shorter and skewed right.
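Bar heights in a count histogram grow with the number of observations, so a region with more counties gets taller bars even when its shape is the same. A minimal sketch with two hypothetical groups of made-up values, showing that `hist(..., plot = FALSE)` gives different counts but identical densities once each group is rescaled to total area 1:

```r
# Hypothetical groups: same shape, but "east" has three times as many values
east <- rep(c(9, 10, 10, 11), times = 3)
west <- c(9, 10, 10, 11)
brks <- seq(8.5, 11.5, by = 1)

h_east <- hist(east, breaks = brks, plot = FALSE)
h_west <- hist(west, breaks = brks, plot = FALSE)

h_east$counts    # three times the west counts
h_west$counts
h_east$density   # identical once each group is rescaled to area 1
h_west$density
```

In ggplot2 (version 3.3.0 or later), mapping `y = after_stat(density)` inside `aes()` applies the analogous rescaling within each facet, which makes the two regions' shapes easier to compare directly.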
Scatterplot of Air Quality vs Latitude
===
```{r}
ggplot(PM, aes(x = latitude, y = pm25)) +
  geom_point(color = "blue") +
  labs(title = "Scatterplot of Air Quality vs Latitude", x = "Latitude", y = "Air Quality")
```
### Scatterplot of Air Quality vs Latitude by Region
```{r}
ggplot(PM, aes(x = latitude, y = pm25)) +
  geom_point(color = "blue") +
  # Fit a separate least-squares line within each region's panel
  geom_smooth(method = "lm", color = "red") +
  facet_wrap(~ region) +
  labs(title = "Scatterplot of Air Quality vs Latitude by Region", x = "Latitude", y = "Air Quality")
```
### Analysis
The east has more data points, and they are clustered more tightly than the west's, which are more spread out.
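`geom_smooth(method = "lm")` fits a separate least-squares line within each facet. A minimal sketch of the same per-region fit in base R with `split()` and `lm()`, using a hypothetical data frame `df` with made-up values; the residual standard deviation from `sigma()` is one way to put a number on how tightly a region's points sit around its line:

```r
# Hypothetical stand-in for PM (made-up values, same column names)
df <- data.frame(
  latitude = c(30, 32, 34, 36, 38, 32, 35, 38, 41, 44),
  pm25     = c(12.0, 11.8, 11.1, 10.9, 10.2, 14.0, 8.5, 12.5, 6.0, 9.0),
  region   = rep(c("east", "west"), each = 5)
)

# Fit pm25 ~ latitude separately within each region, as geom_smooth() does per facet
fits <- lapply(split(df, df$region), function(d) lm(pm25 ~ latitude, data = d))

# Slope of each regional line
slopes <- sapply(fits, function(f) coef(f)[["latitude"]])

# Residual standard deviation: smaller means points sit closer to the line
sigmas <- sapply(fits, sigma)

slopes
sigmas
```

Swapping `df` for `PM` would give the actual slopes and spreads behind the two panels.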